Branch error reduction criterion-based signal recursive decomposition and its application to wind power generation forecasting

Because it avoids mode aliasing and endpoint effects, variational mode decomposition (VMD) is often used as the decomposition module of hybrid models in time-series forecasting. However, the forecast accuracy of the hybrid model is sensitive to the manually set mode number of VMD; neither underdecomposition (the mode number is too small) nor overdecomposition (the mode number is too large) improves forecasting accuracy. To address this issue, this study proposes a branch error reduction (BER) criterion, on the basis of which a VMD-based recursive decomposition method with an adaptive mode number is developed. This decomposition method is combined with commonly used single forecasting models and applied to the wind power generation forecasting task. Experimental results validate the effectiveness of the proposed combination.


Introduction
The increasing frequency of human activities and rapid development of the social economy increase electricity demand, which drives the growth of the global power generation industry. In order to meet future energy and power demand, adapt to changes in energy supply and demand and environmental situation, and realize long-term sustainable development, it is urgent to strengthen the development and utilization of clean and renewable energy [1]. The utilization and exploration of renewable energy power generation will be one of the important issues in the power industry in the future [2]. Because new energy generation is strongly affected by environmental factors, the time series of power generation is intermittent and volatile, which is not conducive to the stable operation and rational planning of the power system [3]. Therefore, it is critical to develop an effective time series model for power generation forecasting.
Currently, time series models can be roughly divided into three categories: classical statistical models, machine learning models, and hybrid models. The ARIMA model is one of the typical representatives of classical statistical models and has been widely used in load forecasting [4]. Commonly used machine learning models include the back propagation (BP) neural network [5], long short-term memory (LSTM) [6], and support vector machine (SVM) [7].
Compared to single models, hybrid models perform better on complex time series forecasting problems. Hybrid models can be further classified into two subtypes.
The first subtype integrates the forecasting results of several single models. Zhang et al. [8] used a genetic algorithm (GA) to optimize the parameters of support vector regression (SVR) in the time series forecasting task. Choi et al. [9] combined CNN and BiLSTM to handle the strong long-memory serial dependence of the dataset.
The second subtype decomposes the time series into subsignals and sums the forecasting results of the subsignals. Bai et al. [10] decomposed the time series using the wavelet transform (WT) [11] and forecasted future air pollutant concentration measurements with a BP neural network. Zheng et al. [12] and Chen et al. [13] used empirical mode decomposition (EMD) [14] to decompose the electric load and combined LSTM and extreme learning machine (ELM). Qin et al. [15] combined ensemble EMD (EEMD) [16] with local polynomial prediction (LPP) as the final model for the forecasting task. Lv et al. [17] decomposed time series using VMD and used LSTM to forecast power load. Cai et al. [18] proposed a combination of VMD, gated recurrent unit (GRU) and temporal convolutional network (TCN) to achieve a satisfactory power load forecasting result.
The time series of electricity generation is generally a broadband signal, and its future trend is not stable. Therefore, it is difficult to approximate the relationship between historical measurements and future changes. The future trend of a narrowband signal is normally considered to be more stable. Therefore, the second type of hybrid model is used to decompose the time series of power generation into narrowband modes, and the final forecast is obtained by summing the forecasted results of each mode. Among the decomposition methods, WT is nonadaptive, and the selection of the wavelet basis strongly affects the decomposition results. EMD suffers from mode aliasing and endpoint effects. EEMD can overcome the shortcomings of EMD to a certain extent but still requires complex calculations and does not completely neutralize the added white noise. VMD [19] is a nonrecursive and robust signal decomposition method that has a solid theoretical basis and circumvents the disadvantages of similar methods.
Therefore, VMD is the best choice for the signal decomposition module in hybrid models. However, VMD is sensitive to the mode number, which must be set manually. A mode number that is too small leads to underdecomposition of the signal, while a mode number that is too large results in overdecomposition. Both underdecomposition and overdecomposition decrease the forecast accuracy of the hybrid model. Therefore, it is important to adaptively determine the optimal mode number for VMD. In many studies, the mode number was aligned with the number determined by EMD [20][21][22], but this method cannot mitigate the negative impact of mode aliasing on forecast accuracy. Huang et al. [23] used a genetic algorithm to optimize VMD parameters to reduce the decomposition loss, but adding another algorithm makes the procedure more complex and less efficient.
To address this issue, we design a branch error reduction criterion, upon which a VMD-based recursive decomposition method with adaptive mode number is proposed.
The primary contributions of this study are as follows:
• The BER criterion is designed, and we show that further decomposing a subsignal leads to better forecasting accuracy if this subsignal satisfies the BER criterion.
• A mode number adaptive VMD-based recursive decomposition method is proposed. A hybrid model that combines the proposed decomposition method with commonly used single forecasting models is applied to the wind power generation forecasting task.
• Experimental results validate that the proposed VMD-based recursive decomposition method can effectively extract the fluctuation patterns of wind power and improve the forecast accuracy.
The remainder of this paper is organized as follows: Principles of VMD and permutation entropy; Derivation of the branch error reduction criterion and the VMD-based recursive decomposition method; Experiments that verified the effectiveness of the proposed method; Summary of the paper.

VMD
VMD assumes that all components are narrowband signals concentrated around their respective center frequencies; thus, VMD constructs constrained optimization problems based on narrowband conditions of components to estimate the center frequency of subsignals and reconstruct corresponding components [12].
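Assume that the original signal $f(t)$ is decomposed into $K$ subsignals, each of which is a mode component with a finite bandwidth around a center frequency. The sum of the estimated bandwidths of the modes is minimized, subject to the constraint that the sum of all modes equals the original signal. The variational model can then be formulated as:

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = f(t) \tag{1}$$

where $u_k(t)$ is the mode function; $\omega_k$ is the mode center frequency; $K$ is the number of modes; $\delta$ is the Dirac function; $*$ is the convolution operator; and $f(t)$ is the input signal. The Lagrange multiplier $\lambda(t)$ and the quadratic penalty factor $\alpha$ are introduced to transform the constrained problem into an unconstrained variational problem:

$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle \tag{2}$$

The optimal solution of the variational problem is obtained by iteratively updating $u_k^{n+1}$, $\omega_k^{n+1}$ and $\lambda^{n+1}$ using the alternating direction method of multipliers. In this study, the iterative updates of the Fourier transforms of $u_k(t)$, $\omega_k$ and $\lambda(t)$ can be expressed as:

$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}^{n}(\omega)/2}{1 + 2\alpha\,(\omega - \omega_k^{n})^2} \tag{3}$$

$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k^{n+1}(\omega) \right|^2 d\omega} \tag{4}$$

$$\hat{\lambda}^{n+1}(\omega) = \hat{\lambda}^{n}(\omega) + \eta \left( \hat{f}(\omega) - \sum_{k=1}^{K} \hat{u}_k^{n+1}(\omega) \right) \tag{5}$$

where $\eta$ is the noise tolerance of the signal.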
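The alternating updates described above can be illustrated with a short NumPy sketch. This is a simplified illustration and not the authors' implementation: the function name `vmd_sketch`, the default values of `alpha`, `eta`, `n_iter`, and `tol`, the uniform initialization of the center frequencies, and the use of the one-sided spectrum without boundary mirroring are all assumptions made here for brevity.

```python
import numpy as np


def vmd_sketch(f, K, alpha=2000.0, eta=0.0, n_iter=500, tol=1e-7):
    """Simplified VMD on the one-sided spectrum (illustrative assumptions only).

    Returns the K modes (rows) and their center frequencies in cycles/sample,
    both sorted by increasing center frequency.
    """
    f = np.asarray(f, dtype=float)
    N = f.size
    freqs = np.fft.rfftfreq(N)              # normalized frequencies in [0, 0.5]
    f_hat = np.fft.rfft(f)
    u_hat = np.zeros((K, freqs.size), dtype=complex)
    omega = (0.5 / K) * np.arange(K)        # assumed uniform initialization
    lam = np.zeros(freqs.size, dtype=complex)

    for _ in range(n_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            # Wiener-filter-like update of mode k around its center frequency
            others = u_hat.sum(axis=0) - u_hat[k]
            u_hat[k] = (f_hat - others + lam / 2) / (
                1 + 2 * alpha * (freqs - omega[k]) ** 2
            )
            # Center frequency = power-weighted mean frequency of the mode
            power = np.abs(u_hat[k]) ** 2
            omega[k] = (freqs @ power) / (power.sum() + 1e-12)
        # Dual ascent on the reconstruction constraint (disabled when eta = 0)
        lam = lam + eta * (f_hat - u_hat.sum(axis=0))
        change = np.sum(np.abs(u_hat - u_prev) ** 2) / (
            np.sum(np.abs(u_prev) ** 2) + 1e-12
        )
        if change < tol:
            break

    modes = np.fft.irfft(u_hat, n=N, axis=1)
    order = np.argsort(omega)
    return modes[order], omega[order]
```

With `K` fixed by hand, as in this sketch, the sensitivity to the mode number discussed in this section can be explored by calling the function with different values of `K` on the same series.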
As a decomposition algorithm widely used in signal processing, VMD effectively overcomes the problems of mode aliasing and endpoint effects; thus, it is often combined with forecasting models to form hybrid models for time series forecasting. The number of decomposition modes is a key parameter that must be set manually when using VMD and is critical to the forecasting results. For example, daily power generation is the result of multiple factors, and its time series is a combination of several modes with different oscillation frequencies. Underdecomposition of the daily power generation series cannot accurately separate each mode, so modes with different fluctuation patterns overlap, which reduces the accuracy of the final results. In the case of overdecomposition, the increase in the mode number increases the computation and training time, which weakens or even cancels the advantage of the stable future trend of some subsignals. Therefore, effectively determining the optimal mode number of VMD becomes a critical problem. Table 1 shows the final forecasting errors obtained by decomposing the daily power generation data in 2020 according to different mode numbers under the three single forecasting models.
Table 1 shows that the forecasting error varies strongly as the number of modes increases, which highlights the importance of the choice of the mode number for accurate forecasting. The trend of the normalized error for the three hybrid models is shown more intuitively in Fig 1, which indicates that the optimal mode number differs for different forecasting models and that this parameter cannot be derived empirically or by simple data processing. Therefore, it is important to develop a method that can effectively determine the optimal mode number corresponding to different forecasting models.

Permutation entropy
During VMD signal decomposition, a residual fraction (RF) is generated that contains more random noise but may also retain part of the information of the original series; thus, the permutation entropy (PE) criterion is used in this paper as the basis for filtering the residual component. PE was proposed by Bandt et al. [24] in 2002 to detect the randomness of time series and is suitable for analyzing nonstationary signals with good robustness. The entropy of a signal reflects its degree of randomness: the larger the entropy, the more random the signal. Therefore, the randomness of RF can be assessed by PE. The calculation steps are as follows.
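We denote a time series as $S = \{s(1), s(2), \cdots, s(n)\}$ and obtain the following matrix after phase-space reconstruction:

$$\begin{bmatrix} s(1) & s(1+\tau) & \cdots & s(1+(m-1)\tau) \\ \vdots & \vdots & & \vdots \\ s(i) & s(i+\tau) & \cdots & s(i+(m-1)\tau) \\ \vdots & \vdots & & \vdots \\ s(n-(m-1)\tau) & s(n-(m-2)\tau) & \cdots & s(n) \end{bmatrix} \tag{6}$$

where $\tau$ is the delay time and $m$ is the embedding dimension. Rearranging each row of the reconstructed matrix in ascending order yields:

$$s(i+(j_1-1)\tau) \le s(i+(j_2-1)\tau) \le \cdots \le s(i+(j_m-1)\tau) \tag{7}$$

where $j_1, j_2, \cdots, j_m$ are the index values of the elements in the reconstructed component. For any row $i$, a symbol sequence $\{j_1, j_2, \cdots, j_m\}$ can be obtained; thus, there are $m!$ different symbol sequences mapped from the $m$-dimensional space. Calculating the probability $P_j$ of each symbol sequence, PE can be defined as:

$$H_{PE}(m) = -\sum_{j=1}^{m!} P_j \ln P_j \tag{8}$$

where PE reaches its maximum $\ln(m!)$ when $P_j = 1/m!$. In practice, normalization is usually performed:

$$0 \le \frac{H_{PE}(m)}{\ln(m!)} \le 1 \tag{9}$$

PE thus describes the randomness of the time series. By calculating the entropy of RF, the components with a larger proportion of noise are eliminated to reduce random noise.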
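The calculation steps above translate directly into a few lines of Python. The sketch below is a minimal illustration; the function name `permutation_entropy` and the default values `m=3` and `tau=1` are assumptions made here, since the embedding parameters used in the paper are not stated in this section.

```python
import math
from collections import Counter

import numpy as np


def permutation_entropy(series, m=3, tau=1, normalize=True):
    """Permutation entropy of a 1-D series (m and tau defaults are assumed)."""
    s = np.asarray(series, dtype=float)
    n_vectors = s.size - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("series is too short for the chosen m and tau")

    # Each embedded vector s(i), s(i+tau), ..., s(i+(m-1)tau) is mapped to the
    # permutation of indices that sorts it in ascending order.
    patterns = Counter(
        tuple(np.argsort(s[i : i + (m - 1) * tau + 1 : tau], kind="stable"))
        for i in range(n_vectors)
    )
    probs = np.array(list(patterns.values()), dtype=float) / n_vectors
    h = -float(np.sum(probs * np.log(probs)))
    return h / math.log(math.factorial(m)) if normalize else h
```

A strictly increasing series produces a single ordinal pattern and therefore an entropy of zero, whereas white noise produces a normalized value close to 1.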

VMD-based signal recursive decomposition
To address this shortcoming of VMD, we propose a mode number adaptive VMD-based recursive decomposition method, which is expected to automatically determine the corresponding optimal mode number when combined with different forecasting models.

Branch error reduction criterion
BER uses the mean absolute error (MAE) as the measure of testing error and determines whether further decomposition is required. If the sum of the subsignals' MAEs is less than the MAE before decomposition, the decomposition is meaningful.
The criterion is based on the following theorem:

Theorem 1. The total testing error decreases when the sum of the errors of the child branches is smaller than the error of the parent branch.

Proof: Let $V$ be the signal to be decomposed; $V^{(k)}$ is the $k$th subsignal of $V$ and its MAE is $e^{(k)}$; $V^{(k,q)}$ is the $q$th subsignal of $V^{(k)}$ and its MAE is $e^{(k,q)}$. If the testing errors before and after the decomposition of branch $k'$ satisfy

$$\sum_{q} e^{(k',q)} < e^{(k')} \tag{10}$$

then this condition can also be expressed as Eq (11), extended as Eq (12), and written in another form as Eq (13), where $V_t^{(k)}$ and $V_t^{(k',q)}$ are the forecasting values of $V^{(k)}$ and $V^{(k',q)}$, respectively. This derivation shows that if the condition of Theorem 1 is satisfied, the total testing error after decomposition is reduced, which proves that the decomposition is meaningful.
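The chain of inequalities behind Theorem 1 can be sketched as follows. This is a reconstruction under stated assumptions rather than the authors' exact equations: the total testing error is taken to be the sum of the branch MAEs, hats denote forecasts (notation introduced here), $E_{\text{before}}$ and $E_{\text{after}}$ are hypothetical symbols, and each signal is assumed to equal the sum of its subsignals.

```latex
% Assumed notation: E = total testing error, hats = forecast values.
\begin{align*}
E_{\text{before}} &= \sum_{k} e^{(k)}, \qquad
E_{\text{after}} = \sum_{k \neq k'} e^{(k)} + \sum_{q} e^{(k',q)}, \\
E_{\text{after}} - E_{\text{before}} &= \sum_{q} e^{(k',q)} - e^{(k')} < 0
\quad \text{whenever} \quad \sum_{q} e^{(k',q)} < e^{(k')}, \\
e^{(k)} &= \frac{1}{N} \sum_{t=1}^{N} \bigl| V_t^{(k)} - \hat{V}_t^{(k)} \bigr|, \\
\frac{1}{N} \sum_{t=1}^{N} \Bigl| V_t - \sum_{k} \hat{V}_t^{(k)} \Bigr|
&\le \sum_{k} e^{(k)}
\quad \text{(triangle inequality, using } V_t = \textstyle\sum_{k} V_t^{(k)} \text{)}.
\end{align*}
```

Under these assumptions, the summed branch error is an upper bound on the MAE of the superimposed forecast, so satisfying the criterion lowers that bound; it does not by itself fix how much the MAE of the aggregated forecast falls.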

BER-based decomposition
Based on Theorem 1, we propose a recursive signal decomposition method based on BER.The specific decomposition process of this method is as follows.
Step 1: The original time series is decomposed with VMD at the first level, with the number of decompositions set to $K_0$, and the subsignals are input into each forecasting model to obtain the corresponding testing error.
Step 2: Decompose and forecast the subsignals in the next level and set the number of decompositions to.
Step 3: Check whether the errors before and after decomposition satisfy the BER criterion. If the error decreases, the decomposition is retained and Step 2 is repeated; otherwise, the decomposition is invalid and the recursion terminates.
A flow diagram of the decomposition method is shown in Fig 2.
Wind power generation data are first decomposed by VMD with the mode number $K_0$, which is obtained via a simple calculation and is a small value within the range of the optimal mode numbers of the different forecasting models. $K_0$ is selected to ensure the initial decomposition of the data without overdecomposition. In this study, is set to 2. In addition, in most cases the subsignals of the final output come from different levels of decomposition.
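Steps 1 to 3 can be sketched as a short recursive routine. The sketch is illustrative only: `decompose_first` and `decompose_next` stand for VMD calls with whatever mode numbers are chosen for the first and later levels, `test_mae` stands for training a forecasting model on a subsignal and returning its testing MAE, and `max_depth` is a safeguard added here; none of these names come from the paper.

```python
from typing import Callable, List

import numpy as np

Decomposer = Callable[[np.ndarray], List[np.ndarray]]
ErrorFn = Callable[[np.ndarray], float]


def ber_recursive_decompose(
    signal: np.ndarray,
    decompose_first: Decomposer,
    decompose_next: Decomposer,
    test_mae: ErrorFn,
    max_depth: int = 3,
) -> List[np.ndarray]:
    """Return the subsignals kept after BER-guided recursive decomposition."""
    leaves: List[np.ndarray] = []
    # Step 1: the first-level decomposition is always performed.
    for child in decompose_first(signal):
        leaves.extend(
            _refine(child, test_mae(child), decompose_next, test_mae, max_depth - 1)
        )
    return leaves


def _refine(branch, branch_mae, decompose_next, test_mae, depth_left):
    if depth_left <= 0:
        return [branch]
    # Step 2: decompose the branch and forecast each child.
    children = decompose_next(branch)
    child_maes = [test_mae(c) for c in children]
    # Step 3: keep the split only if the summed child error is smaller.
    if sum(child_maes) >= branch_mae:
        return [branch]
    leaves = []
    for child, child_mae in zip(children, child_maes):
        leaves.extend(
            _refine(child, child_mae, decompose_next, test_mae, depth_left - 1)
        )
    return leaves
```

Passing the decomposition and error functions as arguments keeps the recursion independent of the forecasting model, which matches the observation that different models lead to different final decompositions.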
The hybrid model used in this study is shown in Fig 3. In this model, the BER-based decomposition method is used as the signal decomposition module to first adaptively decompose the historical wind power generation data into subsignals. Then, these subsignals are fed into the forecasting models, and finally, the forecast results of the subsignals are superimposed to obtain the predicted power generation. The forecasting models in the hybrid model can be statistical, machine learning, or deep learning models such as LSSVM, ELM, DBN, and LSTM. In addition, VMD decomposes the signal into multiple narrowband components and a residual component, which may contain the high-frequency component of the original signal; thus, directly discarding the residual component may cause the loss of the high-frequency component and affect the forecast accuracy. Thus, we choose PE as a measure of signal randomness and set a threshold θ = 1 to filter the residual components of each decomposition.
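The superposition step of the hybrid model can be illustrated with a minimal sketch. The names `hybrid_forecast` and `persistence` are hypothetical, and the persistence forecaster is only a stand-in for the LSSVM, ELM, DBN, or LSTM models used in the study.

```python
import numpy as np


def hybrid_forecast(subsignals, fit_predict, horizon):
    """Forecast each subsignal separately and superimpose the results."""
    forecasts = [
        fit_predict(np.asarray(s, dtype=float), horizon) for s in subsignals
    ]
    return np.sum(forecasts, axis=0)


def persistence(train, horizon):
    """Stand-in forecaster: repeat the last observed value."""
    return np.full(horizon, train[-1])
```

Any model with the same `(train, horizon)` interface can replace `persistence`, so the decomposition module and the forecasting module remain interchangeable.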

Power generation datasets
In this example, the historical data of wind power generation in Fujian Province from 2020 to 2021 are selected for the experiment. The sampling interval of this dataset is 1 h, with a total of 731 wind power generation data points, as shown in Fig 4. The first ten months are selected as the training set, and the last two months are selected as the testing set. The wind power generation in 2021 shows an overall increase compared to 2020, which is primarily reflected in the three seasons of spring, summer, and autumn; the standard deviation in 2021 is relatively small and stable overall. In addition, the seasonal differences within a year are strong, showing high power generation in summer and autumn, and low power generation in spring and autumn; the standard deviation in winter is maintained at a low level, while the variation in spring is more intense and random.

Evaluation indicators
Error indicators. In this study, the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE) are selected as evaluation indicators:

$$\mathrm{MAE} = \frac{1}{N} \sum_{t=1}^{N} \left| V_t^{a} - V_t^{f} \right| \tag{14}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{t=1}^{N} \left( V_t^{a} - V_t^{f} \right)^2} \tag{15}$$

$$\mathrm{MAPE} = \frac{1}{N} \sum_{t=1}^{N} \left| \frac{V_t^{a} - V_t^{f}}{V_t^{a}} \right| \times 100\% \tag{16}$$

where $N$ is the length of the series, $V_t^{a}$ is the real series, and $V_t^{f}$ is the forecasting series.

Improvement indicators. To evaluate the improvement of the proposed decomposition method compared with other methods, the following formula is used as the improvement indicator based on the above error indicators:

$$P = \frac{e_0 - e_1}{e_0} \times 100\% \tag{17}$$

where $P$ is the percentage of error reduction, and $e_1$ and $e_0$ represent the errors of the proposed method and the comparative experiment, respectively.
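The three error indicators and the improvement indicator can be computed as in the following sketch. The function names are chosen here for illustration, and MAPE is returned in percent, which assumes that the real series contains no zeros.

```python
import numpy as np


def mae(actual, forecast):
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(a - f)))


def rmse(actual, forecast):
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return float(np.sqrt(np.mean((a - f) ** 2)))


def mape(actual, forecast):
    # Returned in percent; undefined if any actual value is zero.
    a, f = np.asarray(actual, dtype=float), np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs((a - f) / a)) * 100.0)


def improvement(e_proposed, e_baseline):
    """Percentage error reduction of the proposed method over a baseline."""
    return (e_baseline - e_proposed) / e_baseline * 100.0
```

A positive value of `improvement` means that the proposed method has the lower error; a negative value means that the comparative experiment performed better.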

Comparative experiments
Decomposition according to the center frequency.There is still a lack of general guidelines for the selection of the mode number [25].Among the traditional methods of determining the mode number, the more intuitive and simple method is to observe whether the center frequency is aliased [26].The size of K is increased from K = 2 to observe the distribution of the center frequency.The center frequencies for different values of K are shown in Table 3.
Table 3 shows that when the mode number is above 5, the center frequency of the last mode component remains relatively stable. If K is increased further, the intervals between the center frequencies of the components become smaller, and additional noise components are more likely to be generated because the center frequency of the last layer remains unchanged. Thus, the optimal value of the mode number for the first decomposition is 5. Fig 6 shows the decomposed signal curves of the daily power generation data in 2020 and 2021, where IMF1-IMF5 are the narrowband components and RF is the residual component. From top to bottom, the oscillation of the curves becomes increasingly intense and irregular.
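The visual inspection of center frequencies can be approximated by a simple rule. The sketch below is one possible formalization and not the procedure used in the paper: it returns the smallest K for which the highest center frequency changes by less than a relative tolerance when K is increased to the next tested value. The function name `pick_mode_number` and the tolerance `rel_tol` are assumptions.

```python
def pick_mode_number(center_freqs_by_k, rel_tol=0.05):
    """Pick K at which the last (highest) center frequency stabilizes.

    center_freqs_by_k maps each tested K to the list of center frequencies
    obtained with that K. rel_tol is an assumed stability tolerance.
    """
    ks = sorted(center_freqs_by_k)
    for k, k_next in zip(ks, ks[1:]):
        last = max(center_freqs_by_k[k])
        last_next = max(center_freqs_by_k[k_next])
        if abs(last_next - last) <= rel_tol * abs(last):
            return k
    # No stabilization observed within the tested range.
    return ks[-1]
```

Such a rule depends only on the decomposition itself and ignores the forecasting model, which is the limitation that the BER criterion is intended to address.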
Similarly, a second decomposition is performed for all components according to the central frequencies to obtain the central frequencies at K = 2, 3, 4. The central frequencies obtained by performing this operation for the power generation datasets in 2020 and 2021 are shown in Table 4. The values of K for the second decomposition are 3, 3, 3, 3, 1, and 3 in 2020, and 3, 2, 3, 2, 1, and 3 in 2021. The central frequencies of the fifth component are already aliased at K = 2; thus, the second decomposition is not performed.
Decomposition according to BER criterion. In a deeper decomposition of the original wind power generation series based on BER, the final decomposition results differ markedly from the decomposition according to the central frequency, and different forecasting models correspond to different optimal decompositions. Taking LSSVM as the forecasting model as an example, the decomposition process of the power generation series in 2020 is shown in Fig 7.
With LSSVM as the forecasting model, four components can be decomposed for the second time, and eleven subsignals can be decomposed for the third time. Following the same decomposition process, the optimal decomposition is performed under the forecasting models of ELM, LSTM, and DBN, and the percentage decrease of the error sum corresponding to the latter two decompositions is shown in Table 5. As the decomposition deepens, the error sum decreases, satisfying the BER criterion. These data show that RF produces the largest prediction variations and testing errors. RF contains more noise but also preserves some information about the original series; thus, we use PE to determine whether the RF and the subsignals obtained from the second decomposition of RF should be retained. Table 6 shows the PE of the above components and compares it with the average entropy of the five narrowband components acquired from the first decomposition, which is referred to as avg. IMF.
Table 6 shows that although most of the PE values of the residual components are greater than the avg. IMF, they do not exceed the threshold; thus, no component must be removed. In addition, after the second decomposition of the residual components, the PE of the first two secondary components is significantly reduced, approaching the range of the narrowband components, while the PE of the secondary residual components increases to a certain extent, indicating that the noise components are separated more clearly after the second decomposition. In the experiments, the secondary residual component with the largest PE is removed as an independent comparison experiment.
Thirteen comparison experiments with different decomposition patterns are conducted for each forecasting model separately, all of which divide the original series into five narrowband components and one residual component at the first level of decomposition. The experiments are divided into five groups; the variables between groups are the number of decomposition levels and the decomposition basis, and the variables within groups are the treatment methods of the residual components. In the last group of experiments, a comparison experiment that removes the secondary residual components is conducted. When comparisons are made between groups, the best-performing experiment within each group is selected in all cases:

A. Direct secondary decomposition
• Decomposing the six components according to K = 2.
• Decomposing the six components according to K = 3.

B. Secondary decomposition according to the central frequency
• The five narrowband components are decomposed according to the center frequency, and the residual components are not subjected to the second level of decomposition.
• The five narrowband components are decomposed according to the center frequency, and the residual components are decomposed according to K = 2.
• The five narrowband components are decomposed according to the center frequency, and the residual components are decomposed according to K = 3.

C. Secondary decomposition according to the BER criterion
• Five narrowband components are decomposed for a second time according to the BER criterion, and the residual components are not subjected to the second level of decomposition.
• Five narrowband components are decomposed for a second time according to the BER criterion, and the residual components are decomposed according to K = 2.
• Five narrowband components are decomposed for a second time according to the BER criterion, and the residual components are decomposed according to K = 3.

D. Third decomposition according to the BER criterion
• Five narrowband components are decomposed three times according to the BER criterion, and the residual components are not subjected to the second level of decomposition.
• Five narrowband components are decomposed three times according to the BER criterion, and the residual components are decomposed according to K = 2.
• Five narrowband components are decomposed three times according to the BER criterion, and the residual components are decomposed according to K = 3.
• Decomposing all components three times according to the BER criterion.

E. Third decomposition and removal of secondary residual components
• Based on the optimal decomposition, the secondary residual components are removed.
Experimental results and analysis.

Comparison experiment I
Group A and Group B are measured by three error indicators and compared with the first level of decomposition. The forecasting results of each model are shown in Table 7.
The results in Table 7 show the following:
• The experimental errors of direct secondary decomposition and of secondary decomposition according to the central frequency are lower than those of first-level decomposition in most instances, but there are also cases where the accuracy is extremely poor. These results indicate that deeper levels of decomposition do not equate to more accurate results.
• In the experiments conducted on both the 2020 and 2021 datasets, the mode number of Group B is larger than that of Group A. In the experiments based on LSTM, the error of Group B decreases markedly compared with that of Group A. However, in the experiments based on the other models, the advantage of Group B is not large, and there is even one large error. Thus, more modes do not equate to less error at the same decomposition level. In addition, both decomposition methods have certain drawbacks and are of limited use in reducing the forecast error.

Comparison experiment II
In the experiments based on the BER criterion, only some subsignals satisfy the condition for the third decomposition. Therefore, in addition to the experimental results of Group C and Group D, another comparison experiment is set up in which all the subsignals obtained by the second decomposition are decomposed for the third time, to verify the accuracy and superior performance of the recursive decomposition method based on the BER criterion. Experimental results are shown in Table 8.
The errors of Group D and the comparison data in Table 8 show that a deeper decomposition that does not satisfy the BER criterion is likely to cause an error explosion. The errors of Group C and Group D, which follow the BER criterion, are compared with those of the first-level decomposition in Table 9 and Fig 8. The experimental error decreases to some extent at each level of decomposition for all forecasting models. Combined with the previous conclusion that the number of decomposition levels and modes is not proportional to the reduction in forecasting error, this further shows the superiority of the BER criterion in decomposing time series.

Comparison experiment III
To describe the influence of the residual components on the forecast accuracy, the secondary residual component with the largest PE is chosen to be discarded as a comparison experiment. The relative percentage decrease in the error for Group E compared with Group D is shown in Table 10.
As shown in Table 10, the errors of the four groups of experiments decreased to a certain extent after the removal of the secondary residual components. Fig 9 shows the average percentage decrease in MAE. The experiments with LSSVM as the forecasting model show larger decreases after removing the residual components, indicating that the decomposition of the wind power generation series is more accurate in this experiment and that the noise in the series is separated more successfully.

Summary. These comparative experiments and data results show the following:
• To achieve better separation for higher forecasting accuracy, blind decomposition is undesirable. Neither deeper decomposition levels nor a larger number of modes guarantees a smaller error.
• Traditional decomposition methods are inconsistent in their validity and effectiveness, which makes them inappropriate as criteria for determining the mode number.
• Recursive decomposition based on BER has a complete mathematical derivation and shows stability in the real forecasting process, which makes it more objective than other decomposition methods.

Conclusions
In this paper, we propose a recursive decomposition method based on the branch error reduction criterion to decompose wind power generation into multiple modes that are more regular and more easily trained. Four forecasting models, LSSVM, ELM, DBN, and LSTM, are used for forecasting, and the superior performance of the proposed decomposition method is primarily shown in the following results:
• Taking full advantage of VMD, the proposed method can decompose the original time series into subcomponents with more easily captured features.
• The branch error reduction criterion is supported by mathematical theory, which improves the reliability and robustness of the overall model.
• The ambiguous, judgment-based selection of the mode number is replaced by a mathematical criterion, which facilitates program integration and modular design of the signal decomposition.
Because the trend of power generation and the influencing factors change over time, the distribution of the dataset used for training is not consistent with that of new data, so a previously trained model cannot forecast the present data with high accuracy; that is, distribution drift occurs. To improve model generalizability and make the distributions of training and testing data as consistent as possible, future work will address distribution drift by building on the existing research with a sample weighting strategy to improve prediction accuracy.

Fig 7. LSSVM decomposition process. https://doi.org/10.1371/journal.pone.0299955.g007